Overview

Dataset statistics

Number of variables: 14
Number of observations: 400
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 43.9 KiB
Average record size in memory: 112.3 B
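
The two memory figures above are related by simple arithmetic: the average record size is the total size divided by the number of observations. A minimal sketch, using only the numbers reported above (the function name is illustrative, not part of pandas-profiling):

```python
# Figures copied from the dataset statistics above.
TOTAL_KIB = 43.9
N_OBSERVATIONS = 400


def average_record_bytes(total_kib: float, n_rows: int) -> float:
    """Convert a total size in KiB to an average size per row in bytes."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_bytes(TOTAL_KIB, N_OBSERVATIONS)
print(f"about {avg_bytes:.1f} B per record")
```

The result differs slightly from the reported 112.3 B because the total shown in the report is itself rounded to one decimal place.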

Variable types

Numeric: 13
Categorical: 1

Alerts

CRIM is highly correlated with ZN and 8 other fields [High correlation]
ZN is highly correlated with CRIM and 4 other fields [High correlation]
INDUS is highly correlated with CRIM and 7 other fields [High correlation]
NOX is highly correlated with CRIM and 8 other fields [High correlation]
RM is highly correlated with LSTAT and 1 other field [High correlation]
AGE is highly correlated with CRIM and 7 other fields [High correlation]
DIS is highly correlated with CRIM and 7 other fields [High correlation]
RAD is highly correlated with CRIM and 3 other fields [High correlation]
TAX is highly correlated with CRIM and 7 other fields [High correlation]
PTRATIO is highly correlated with MEDV [High correlation]
LSTAT is highly correlated with CRIM and 7 other fields [High correlation]
MEDV is highly correlated with CRIM and 7 other fields [High correlation]
CRIM is highly correlated with RAD and 1 other field [High correlation]
ZN is highly correlated with INDUS and 2 other fields [High correlation]
INDUS is highly correlated with ZN and 7 other fields [High correlation]
NOX is highly correlated with INDUS and 5 other fields [High correlation]
RM is highly correlated with LSTAT and 1 other field [High correlation]
AGE is highly correlated with ZN and 4 other fields [High correlation]
DIS is highly correlated with ZN and 5 other fields [High correlation]
RAD is highly correlated with CRIM and 4 other fields [High correlation]
TAX is highly correlated with CRIM and 5 other fields [High correlation]
LSTAT is highly correlated with INDUS and 7 other fields [High correlation]
MEDV is highly correlated with INDUS and 2 other fields [High correlation]
CRIM is highly correlated with INDUS and 5 other fields [High correlation]
ZN is highly correlated with INDUS [High correlation]
INDUS is highly correlated with CRIM and 3 other fields [High correlation]
NOX is highly correlated with CRIM and 3 other fields [High correlation]
RM is highly correlated with MEDV [High correlation]
AGE is highly correlated with CRIM and 2 other fields [High correlation]
DIS is highly correlated with CRIM and 3 other fields [High correlation]
RAD is highly correlated with CRIM and 1 other field [High correlation]
TAX is highly correlated with CRIM and 1 other field [High correlation]
LSTAT is highly correlated with MEDV [High correlation]
MEDV is highly correlated with RM and 1 other field [High correlation]
CRIM is highly correlated with INDUS and 2 other fields [High correlation]
ZN is highly correlated with INDUS and 7 other fields [High correlation]
INDUS is highly correlated with CRIM and 8 other fields [High correlation]
NOX is highly correlated with CRIM and 9 other fields [High correlation]
RM is highly correlated with PTRATIO and 2 other fields [High correlation]
AGE is highly correlated with ZN and 7 other fields [High correlation]
DIS is highly correlated with ZN and 8 other fields [High correlation]
RAD is highly correlated with ZN and 8 other fields [High correlation]
TAX is highly correlated with ZN and 6 other fields [High correlation]
PTRATIO is highly correlated with ZN and 8 other fields [High correlation]
B is highly correlated with CRIM and 1 other field [High correlation]
LSTAT is highly correlated with NOX and 6 other fields [High correlation]
MEDV is highly correlated with ZN and 9 other fields [High correlation]
ZN has 296 (74.0%) zeros [Zeros]

Reproduction

Analysis started: 2021-12-08 11:27:41.854202
Analysis finished: 2021-12-08 11:28:17.504570
Duration: 35.65 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

CRIM
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 398
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.757190925
Minimum: 0.00906
Maximum: 88.9762
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0.00906
5-th percentile: 0.028694
Q1: 0.07782
median: 0.24217
Q3: 3.5434275
95-th percentile: 16.89612
Maximum: 88.9762
Range: 88.96714
Interquartile range (IQR): 3.4656075

Descriptive statistics

Standard deviation: 9.155495507
Coefficient of variation (CV): 2.436792723
Kurtosis: 35.16111791
Mean: 3.757190925
Median Absolute Deviation (MAD): 0.207705
Skewness: 5.159668357
Sum: 1502.87637
Variance: 83.82309797
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
14.3337 | 2 | 0.5%
0.01501 | 2 | 0.5%
0.08265 | 1 | 0.2%
0.21977 | 1 | 0.2%
0.06664 | 1 | 0.2%
0.02498 | 1 | 0.2%
0.05515 | 1 | 0.2%
0.11027 | 1 | 0.2%
22.5971 | 1 | 0.2%
0.28955 | 1 | 0.2%
Other values (388) | 388 | 97.0%

Smallest values

Value | Count | Frequency (%)
0.00906 | 1 | 0.2%
0.01096 | 1 | 0.2%
0.01301 | 1 | 0.2%
0.01311 | 1 | 0.2%
0.01381 | 1 | 0.2%
0.01439 | 1 | 0.2%
0.01501 | 2 | 0.5%
0.01538 | 1 | 0.2%
0.01709 | 1 | 0.2%
0.01778 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
88.9762 | 1 | 0.2%
73.5341 | 1 | 0.2%
67.9208 | 1 | 0.2%
51.1358 | 1 | 0.2%
41.5292 | 1 | 0.2%
38.3518 | 1 | 0.2%
37.6619 | 1 | 0.2%
28.6558 | 1 | 0.2%
25.9406 | 1 | 0.2%
25.0461 | 1 | 0.2%
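
Several of the CRIM figures above are derived from one another. The sketch below recomputes them from the reported mean, standard deviation, quartiles, and extremes. It assumes the usual definitions (variance as the squared standard deviation, CV as standard deviation over mean) and is not taken from the pandas-profiling source.

```python
# Values copied from the CRIM statistics above.
n = 400
mean = 3.757190925
std = 9.155495507
q1, q3 = 0.07782, 3.5434275
minimum, maximum = 0.00906, 88.9762

derived = {
    "variance": std ** 2,        # squared standard deviation
    "cv": std / mean,            # coefficient of variation
    "iqr": q3 - q1,              # interquartile range
    "range": maximum - minimum,  # spread between the extremes
    "sum": mean * n,             # mean times number of observations
}

for name, value in derived.items():
    print(f"{name}: {value:.6f}")
```

Each recomputed value agrees with the corresponding figure in the report up to rounding.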

ZN
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 23
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.97
Minimum: 0
Maximum: 95
Zeros: 296
Zeros (%): 74.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 12.5
95-th percentile: 80
Maximum: 95
Range: 95
Interquartile range (IQR): 12.5

Descriptive statistics

Standard deviation: 22.79626118
Coefficient of variation (CV): 2.078054802
Kurtosis: 4.344427362
Mean: 10.97
Median Absolute Deviation (MAD): 0
Skewness: 2.280664508
Sum: 4388
Variance: 519.6695238
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=23)

Common values

Value | Count | Frequency (%)
0 | 296 | 74.0%
20 | 19 | 4.8%
80 | 12 | 3.0%
25 | 9 | 2.2%
22 | 7 | 1.8%
12.5 | 7 | 1.8%
40 | 5 | 1.2%
90 | 5 | 1.2%
30 | 4 | 1.0%
45 | 4 | 1.0%
Other values (13) | 32 | 8.0%

Smallest values

Value | Count | Frequency (%)
0 | 296 | 74.0%
12.5 | 7 | 1.8%
17.5 | 1 | 0.2%
20 | 19 | 4.8%
21 | 3 | 0.8%
22 | 7 | 1.8%
25 | 9 | 2.2%
28 | 2 | 0.5%
30 | 4 | 1.0%
33 | 4 | 1.0%

Largest values

Value | Count | Frequency (%)
95 | 3 | 0.8%
90 | 5 | 1.2%
85 | 2 | 0.5%
80 | 12 | 3.0%
75 | 1 | 0.2%
70 | 3 | 0.8%
60 | 3 | 0.8%
55 | 3 | 0.8%
52.5 | 2 | 0.5%
45 | 4 | 1.0%

INDUS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 72
Distinct (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.936425
Minimum: 0.46
Maximum: 27.74
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0.46
5-th percentile: 2.172
Q1: 5.13
median: 8.56
Q3: 18.1
95-th percentile: 19.58
Maximum: 27.74
Range: 27.28
Interquartile range (IQR): 12.97

Descriptive statistics

Standard deviation: 6.848042068
Coefficient of variation (CV): 0.6261682468
Kurtosis: -1.211903009
Mean: 10.936425
Median Absolute Deviation (MAD): 5.33
Skewness: 0.3233090448
Sum: 4374.57
Variance: 46.89568017
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
18.1 | 103 | 25.8%
19.58 | 23 | 5.8%
8.14 | 18 | 4.5%
6.2 | 15 | 3.8%
3.97 | 11 | 2.8%
10.59 | 11 | 2.8%
21.89 | 10 | 2.5%
9.9 | 9 | 2.2%
6.91 | 9 | 2.2%
5.19 | 8 | 2.0%
Other values (62) | 183 | 45.8%

Smallest values

Value | Count | Frequency (%)
0.46 | 1 | 0.2%
0.74 | 1 | 0.2%
1.21 | 1 | 0.2%
1.22 | 1 | 0.2%
1.25 | 2 | 0.5%
1.38 | 1 | 0.2%
1.47 | 2 | 0.5%
1.52 | 3 | 0.8%
1.69 | 2 | 0.5%
1.76 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
27.74 | 4 | 1.0%
25.65 | 5 | 1.2%
21.89 | 10 | 2.5%
19.58 | 23 | 5.8%
18.1 | 103 | 25.8%
15.04 | 2 | 0.5%
13.92 | 4 | 1.0%
13.89 | 4 | 1.0%
12.83 | 4 | 1.0%
11.93 | 5 | 1.2%

CHAS
Categorical

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 400
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 400 | 100.0%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 400 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 400 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
0 | 371 | 92.8%
1 | 29 | 7.2%

NOX
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 80
Distinct (%): 20.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5528165
Minimum: 0.385
Maximum: 0.871
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0.385
5-th percentile: 0.409
Q1: 0.449
median: 0.532
Q3: 0.624
95-th percentile: 0.74
Maximum: 0.871
Range: 0.486
Interquartile range (IQR): 0.175

Descriptive statistics

Standard deviation: 0.1154880329
Coefficient of variation (CV): 0.2089084405
Kurtosis: 0.006077868001
Mean: 0.5528165
Median Absolute Deviation (MAD): 0.084
Skewness: 0.7618370193
Sum: 221.1266
Variance: 0.01333748574
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.538 | 18 | 4.5%
0.489 | 15 | 3.8%
0.713 | 14 | 3.5%
0.871 | 13 | 3.2%
0.74 | 12 | 3.0%
0.437 | 12 | 3.0%
0.693 | 11 | 2.8%
0.605 | 10 | 2.5%
0.647 | 10 | 2.5%
0.624 | 10 | 2.5%
Other values (70) | 275 | 68.8%

Smallest values

Value | Count | Frequency (%)
0.385 | 1 | 0.2%
0.389 | 1 | 0.2%
0.392 | 1 | 0.2%
0.394 | 1 | 0.2%
0.398 | 2 | 0.5%
0.4 | 4 | 1.0%
0.401 | 2 | 0.5%
0.403 | 3 | 0.8%
0.404 | 2 | 0.5%
0.405 | 2 | 0.5%

Largest values

Value | Count | Frequency (%)
0.871 | 13 | 3.2%
0.77 | 3 | 0.8%
0.74 | 12 | 3.0%
0.718 | 4 | 1.0%
0.713 | 14 | 3.5%
0.7 | 9 | 2.2%
0.693 | 11 | 2.8%
0.679 | 6 | 1.5%
0.671 | 6 | 1.5%
0.668 | 3 | 0.8%

RM
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 362
Distinct (%): 90.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.292165
Minimum: 4.138
Maximum: 8.78
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 4.138
5-th percentile: 5.304
Q1: 5.8775
median: 6.2085
Q3: 6.6205
95-th percentile: 7.6947
Maximum: 8.78
Range: 4.642
Interquartile range (IQR): 0.743

Descriptive statistics

Standard deviation: 0.7099234861
Coefficient of variation (CV): 0.1128265845
Kurtosis: 1.581516657
Mean: 6.292165
Median Absolute Deviation (MAD): 0.3535
Skewness: 0.6512418176
Sum: 2516.866
Variance: 0.5039913562
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
6.405 | 3 | 0.8%
6.417 | 3 | 0.8%
6.167 | 3 | 0.8%
6.376 | 2 | 0.5%
6.38 | 2 | 0.5%
6.209 | 2 | 0.5%
6.63 | 2 | 0.5%
6.009 | 2 | 0.5%
6.185 | 2 | 0.5%
6.127 | 2 | 0.5%
Other values (352) | 377 | 94.2%

Smallest values

Value | Count | Frequency (%)
4.138 | 2 | 0.5%
4.368 | 1 | 0.2%
4.628 | 1 | 0.2%
4.652 | 1 | 0.2%
4.88 | 1 | 0.2%
4.903 | 1 | 0.2%
4.906 | 1 | 0.2%
4.926 | 1 | 0.2%
4.963 | 1 | 0.2%
4.97 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
8.78 | 1 | 0.2%
8.725 | 1 | 0.2%
8.704 | 1 | 0.2%
8.398 | 1 | 0.2%
8.375 | 1 | 0.2%
8.337 | 1 | 0.2%
8.297 | 1 | 0.2%
8.266 | 1 | 0.2%
8.259 | 1 | 0.2%
8.247 | 1 | 0.2%

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 296
Distinct (%): 74.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.086
Minimum: 2.9
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 2.9
5-th percentile: 18.37
Q1: 42.375
median: 76.95
Q3: 93.825
95-th percentile: 100
Maximum: 100
Range: 97.1
Interquartile range (IQR): 51.45

Descriptive statistics

Standard deviation: 28.38688769
Coefficient of variation (CV): 0.4169269407
Kurtosis: -1.012964048
Mean: 68.086
Median Absolute Deviation (MAD): 20.35
Skewness: -0.5732836116
Sum: 27234.4
Variance: 805.8153925
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
100 | 33 | 8.2%
97.9 | 4 | 1.0%
95.4 | 4 | 1.0%
98.2 | 4 | 1.0%
95.6 | 3 | 0.8%
98.9 | 3 | 0.8%
36.6 | 3 | 0.8%
21.4 | 3 | 0.8%
32.2 | 3 | 0.8%
98.8 | 3 | 0.8%
Other values (286) | 337 | 84.2%

Smallest values

Value | Count | Frequency (%)
2.9 | 1 | 0.2%
6 | 1 | 0.2%
6.2 | 1 | 0.2%
6.5 | 1 | 0.2%
6.6 | 2 | 0.5%
7.8 | 2 | 0.5%
8.4 | 1 | 0.2%
8.9 | 1 | 0.2%
9.8 | 1 | 0.2%
10 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
100 | 33 | 8.2%
99.3 | 1 | 0.2%
99.1 | 1 | 0.2%
98.9 | 3 | 0.8%
98.8 | 3 | 0.8%
98.7 | 1 | 0.2%
98.5 | 1 | 0.2%
98.4 | 2 | 0.5%
98.3 | 2 | 0.5%
98.2 | 4 | 1.0%

DIS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 339
Distinct (%): 84.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.81946175
Minimum: 1.1296
Maximum: 12.1265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1.1296
5-th percentile: 1.439495
Q1: 2.10915
median: 3.2721
Q3: 5.2146
95-th percentile: 7.9549
Maximum: 12.1265
Range: 10.9969
Interquartile range (IQR): 3.10545

Descriptive statistics

Standard deviation: 2.132444822
Coefficient of variation (CV): 0.5583102964
Kurtosis: 0.6284975576
Mean: 3.81946175
Median Absolute Deviation (MAD): 1.31305
Skewness: 1.040768718
Sum: 1527.7847
Variance: 4.547320918
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
5.7209 | 4 | 1.0%
3.4952 | 4 | 1.0%
5.2873 | 4 | 1.0%
5.4007 | 4 | 1.0%
6.0622 | 3 | 0.8%
3.6519 | 3 | 0.8%
4.8122 | 3 | 0.8%
7.8278 | 3 | 0.8%
6.498 | 3 | 0.8%
6.8147 | 3 | 0.8%
Other values (329) | 366 | 91.5%

Smallest values

Value | Count | Frequency (%)
1.1296 | 1 | 0.2%
1.137 | 1 | 0.2%
1.1691 | 1 | 0.2%
1.1742 | 1 | 0.2%
1.1781 | 1 | 0.2%
1.2024 | 1 | 0.2%
1.3163 | 1 | 0.2%
1.3216 | 1 | 0.2%
1.3325 | 1 | 0.2%
1.3449 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
12.1265 | 1 | 0.2%
10.7103 | 2 | 0.5%
10.5857 | 2 | 0.5%
9.2229 | 1 | 0.2%
9.2203 | 1 | 0.2%
9.1876 | 1 | 0.2%
9.0892 | 1 | 0.2%
8.9067 | 1 | 0.2%
8.7921 | 2 | 0.5%
8.6966 | 1 | 0.2%

RAD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.4625
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 24
95-th percentile: 24
Maximum: 24
Range: 23
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 8.687478025
Coefficient of variation (CV): 0.918095432
Kurtosis: -0.8265621333
Mean: 9.4625
Median Absolute Deviation (MAD): 2
Skewness: 1.023992366
Sum: 3785
Variance: 75.47227444
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)

Common values

Value | Count | Frequency (%)
24 | 103 | 25.8%
5 | 90 | 22.5%
4 | 90 | 22.5%
3 | 33 | 8.2%
8 | 20 | 5.0%
6 | 17 | 4.2%
2 | 17 | 4.2%
1 | 17 | 4.2%
7 | 13 | 3.2%

Smallest values

Value | Count | Frequency (%)
1 | 17 | 4.2%
2 | 17 | 4.2%
3 | 33 | 8.2%
4 | 90 | 22.5%
5 | 90 | 22.5%
6 | 17 | 4.2%
7 | 13 | 3.2%
8 | 20 | 5.0%
24 | 103 | 25.8%

Largest values

Value | Count | Frequency (%)
24 | 103 | 25.8%
8 | 20 | 5.0%
7 | 13 | 3.2%
6 | 17 | 4.2%
5 | 90 | 22.5%
4 | 90 | 22.5%
3 | 33 | 8.2%
2 | 17 | 4.2%
1 | 17 | 4.2%

TAX
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 63
Distinct (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 403.7975
Minimum: 187
Maximum: 711
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 187
5-th percentile: 222
Q1: 277
median: 329
Q3: 666
95-th percentile: 666
Maximum: 711
Range: 524
Interquartile range (IQR): 389

Descriptive statistics

Standard deviation: 169.6568156
Coefficient of variation (CV): 0.4201532095
Kurtosis: -1.112504552
Mean: 403.7975
Median Absolute Deviation (MAD): 74
Skewness: 0.7030860974
Sum: 161519
Variance: 28783.43508
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
666 | 103 | 25.8%
307 | 33 | 8.2%
403 | 23 | 5.8%
277 | 11 | 2.8%
304 | 11 | 2.8%
264 | 11 | 2.8%
437 | 10 | 2.5%
224 | 9 | 2.2%
233 | 9 | 2.2%
398 | 8 | 2.0%
Other values (53) | 172 | 43.0%

Smallest values

Value | Count | Frequency (%)
187 | 1 | 0.2%
188 | 5 | 1.2%
193 | 7 | 1.8%
198 | 1 | 0.2%
216 | 5 | 1.2%
222 | 7 | 1.8%
223 | 4 | 1.0%
224 | 9 | 2.2%
226 | 1 | 0.2%
233 | 9 | 2.2%

Largest values

Value | Count | Frequency (%)
711 | 4 | 1.0%
666 | 103 | 25.8%
437 | 10 | 2.5%
432 | 6 | 1.5%
430 | 3 | 0.8%
422 | 1 | 0.2%
411 | 2 | 0.5%
403 | 23 | 5.8%
402 | 2 | 0.5%
398 | 8 | 2.0%

PTRATIO
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 44
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.459
Minimum: 12.6
Maximum: 22
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 12.6
5-th percentile: 14.7
Q1: 17.4
median: 18.95
Q3: 20.2
95-th percentile: 21
Maximum: 22
Range: 9.4
Interquartile range (IQR): 2.8

Descriptive statistics

Standard deviation: 2.148104953
Coefficient of variation (CV): 0.116371686
Kurtosis: -0.1532549156
Mean: 18.459
Median Absolute Deviation (MAD): 1.25
Skewness: -0.8363241144
Sum: 7383.6
Variance: 4.614354887
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Common values

Value | Count | Frequency (%)
20.2 | 111 | 27.8%
14.7 | 24 | 6.0%
21 | 23 | 5.8%
17.8 | 18 | 4.5%
18.6 | 16 | 4.0%
17.4 | 15 | 3.8%
19.2 | 15 | 3.8%
18.4 | 13 | 3.2%
19.1 | 12 | 3.0%
13 | 11 | 2.8%
Other values (34) | 142 | 35.5%

Smallest values

Value | Count | Frequency (%)
12.6 | 2 | 0.5%
13 | 11 | 2.8%
13.6 | 1 | 0.2%
14.4 | 1 | 0.2%
14.7 | 24 | 6.0%
14.8 | 3 | 0.8%
14.9 | 4 | 1.0%
15.2 | 8 | 2.0%
15.3 | 2 | 0.5%
15.5 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
22 | 2 | 0.5%
21.2 | 10 | 2.5%
21 | 23 | 5.8%
20.9 | 7 | 1.8%
20.2 | 111 | 27.8%
20.1 | 4 | 1.0%
19.7 | 7 | 1.8%
19.6 | 5 | 1.2%
19.2 | 15 | 3.8%
19.1 | 12 | 3.0%

B
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 286
Distinct (%): 71.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 359.455375
Minimum: 0.32
Maximum: 396.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0.32
5-th percentile: 97.484
Q1: 376.115
median: 391.575
Q3: 396.285
95-th percentile: 396.9
Maximum: 396.9
Range: 396.58
Interquartile range (IQR): 20.17

Descriptive statistics

Standard deviation: 86.73290591
Coefficient of variation (CV): 0.2412897732
Kurtosis: 8.565800309
Mean: 359.455375
Median Absolute Deviation (MAD): 5.325
Skewness: -3.082984701
Sum: 143782.15
Variance: 7522.596967
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
396.9 | 98 | 24.5%
395.56 | 2 | 0.5%
396.21 | 2 | 0.5%
395.63 | 2 | 0.5%
396.06 | 2 | 0.5%
389.71 | 2 | 0.5%
395.24 | 2 | 0.5%
394.72 | 2 | 0.5%
377.07 | 2 | 0.5%
395.11 | 2 | 0.5%
Other values (276) | 284 | 71.0%

Smallest values

Value | Count | Frequency (%)
0.32 | 1 | 0.2%
2.52 | 1 | 0.2%
2.6 | 1 | 0.2%
3.5 | 1 | 0.2%
7.68 | 1 | 0.2%
9.32 | 1 | 0.2%
16.45 | 1 | 0.2%
18.82 | 1 | 0.2%
21.57 | 1 | 0.2%
22.01 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
396.9 | 98 | 24.5%
396.33 | 1 | 0.2%
396.3 | 1 | 0.2%
396.28 | 1 | 0.2%
396.24 | 1 | 0.2%
396.23 | 1 | 0.2%
396.21 | 2 | 0.5%
396.14 | 1 | 0.2%
396.06 | 2 | 0.5%
395.93 | 1 | 0.2%

LSTAT
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 365
Distinct (%): 91.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.668525
Minimum: 1.92
Maximum: 37.97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1.92
5-th percentile: 3.9485
Q1: 6.99
median: 10.875
Q3: 16.91
95-th percentile: 27.266
Maximum: 37.97
Range: 36.05
Interquartile range (IQR): 9.92

Descriptive statistics

Standard deviation: 7.207046752
Coefficient of variation (CV): 0.5688939124
Kurtosis: 0.4587133226
Mean: 12.668525
Median Absolute Deviation (MAD): 4.515
Skewness: 0.9507886989
Sum: 5067.41
Variance: 51.94152288
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
6.36 | 3 | 0.8%
8.05 | 3 | 0.8%
9.5 | 2 | 0.5%
18.06 | 2 | 0.5%
7.39 | 2 | 0.5%
5.68 | 2 | 0.5%
13.15 | 2 | 0.5%
4.56 | 2 | 0.5%
6.72 | 2 | 0.5%
17.27 | 2 | 0.5%
Other values (355) | 378 | 94.5%

Smallest values

Value | Count | Frequency (%)
1.92 | 1 | 0.2%
2.47 | 1 | 0.2%
2.88 | 1 | 0.2%
2.94 | 1 | 0.2%
2.96 | 1 | 0.2%
2.97 | 1 | 0.2%
3.01 | 1 | 0.2%
3.11 | 1 | 0.2%
3.13 | 1 | 0.2%
3.16 | 1 | 0.2%

Largest values

Value | Count | Frequency (%)
37.97 | 1 | 0.2%
34.77 | 1 | 0.2%
34.41 | 1 | 0.2%
34.37 | 1 | 0.2%
34.02 | 1 | 0.2%
31.99 | 1 | 0.2%
30.81 | 2 | 0.5%
30.63 | 1 | 0.2%
30.62 | 1 | 0.2%
30.59 | 1 | 0.2%

MEDV
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 205
Distinct (%): 51.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.47575
Minimum: 5
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 10.39
Q1: 17.1
median: 21
Q3: 25
95-th percentile: 43.81
Maximum: 50
Range: 45
Interquartile range (IQR): 7.9

Descriptive statistics

Standard deviation: 9.218611279
Coefficient of variation (CV): 0.4101581162
Kurtosis: 1.562946674
Mean: 22.47575
Median Absolute Deviation (MAD): 4
Skewness: 1.132726221
Sum: 8990.3
Variance: 84.98279392
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
50 | 13 | 3.2%
22 | 6 | 1.5%
25 | 5 | 1.2%
15.6 | 5 | 1.2%
21.7 | 5 | 1.2%
23.1 | 5 | 1.2%
21.2 | 5 | 1.2%
23.9 | 5 | 1.2%
19.4 | 5 | 1.2%
17.8 | 5 | 1.2%
Other values (195) | 341 | 85.2%

Smallest values

Value | Count | Frequency (%)
5 | 2 | 0.5%
5.6 | 1 | 0.2%
6.3 | 1 | 0.2%
7 | 1 | 0.2%
7.2 | 2 | 0.5%
7.4 | 1 | 0.2%
7.5 | 1 | 0.2%
8.3 | 2 | 0.5%
8.4 | 1 | 0.2%
8.5 | 2 | 0.5%

Largest values

Value | Count | Frequency (%)
50 | 13 | 3.2%
48.8 | 1 | 0.2%
48.3 | 1 | 0.2%
46.7 | 1 | 0.2%
46 | 1 | 0.2%
45.4 | 1 | 0.2%
44.8 | 1 | 0.2%
44 | 1 | 0.2%
43.8 | 1 | 0.2%
43.1 | 1 | 0.2%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
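
The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is an illustrative sketch, not the code pandas-profiling runs; the helper names are made up for the example, and tied values receive the average of their ranks (a common convention, assumed here).

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x**3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("spearman:", spearman(x, y))
print("pearson: ", pearson(x, y))
```

On this toy data ρ reaches its maximum because the relationship is perfectly monotonic, while r stays below 1 because the relationship is not linear.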

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis (the steepness of the line) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
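
A minimal sketch of this formula, together with a check of the location and scale invariance mentioned above. The data and the function name are made up for illustration, and the rescaling uses positive factors only, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)


# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print("r:         ", r)
print("r rescaled:", r_rescaled)
```

The two printed values agree up to floating-point error, which is the invariance the paragraph above describes.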

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
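
The pair counting can be sketched directly. The version below implements exactly the formula stated above (often called tau-a) and counts tied pairs as neither concordant nor discordant. Library implementations commonly apply a tie correction instead, so their results can differ when ties are present; the function name here is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print("tau:", tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.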

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

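Index | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV
0 | 0.95577 | 0.0 | 8.14 | 0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4 | 307.0 | 21.0 | 306.38 | 17.28 | 14.8
1 | 0.02875 | 28.0 | 15.04 | 0 | 0.464 | 6.211 | 28.9 | 3.6659 | 4 | 270.0 | 18.2 | 396.33 | 6.21 | 25.0
2 | 1.22358 | 0.0 | 19.58 | 0 | 0.605 | 6.943 | 97.4 | 1.8773 | 5 | 403.0 | 14.7 | 363.43 | 4.59 | 41.3
3 | 5.66637 | 0.0 | 18.10 | 0 | 0.740 | 6.219 | 100.0 | 2.0048 | 24 | 666.0 | 20.2 | 395.69 | 16.59 | 18.4
4 | 0.04544 | 0.0 | 3.24 | 0 | 0.460 | 6.144 | 32.2 | 5.8736 | 4 | 430.0 | 16.9 | 368.57 | 9.09 | 19.8
5 | 0.10659 | 80.0 | 1.91 | 0 | 0.413 | 5.936 | 19.5 | 10.5857 | 4 | 334.0 | 22.0 | 376.04 | 5.57 | 20.6
6 | 51.13580 | 0.0 | 18.10 | 0 | 0.597 | 5.757 | 100.0 | 1.4130 | 24 | 666.0 | 20.2 | 2.60 | 10.11 | 15.0
7 | 3.32105 | 0.0 | 19.58 | 1 | 0.871 | 5.403 | 100.0 | 1.3216 | 5 | 403.0 | 14.7 | 396.90 | 26.82 | 13.4
8 | 1.05393 | 0.0 | 8.14 | 0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4 | 307.0 | 21.0 | 386.85 | 6.58 | 23.1
9 | 0.24522 | 0.0 | 9.90 | 0 | 0.544 | 5.782 | 71.7 | 4.0317 | 4 | 304.0 | 18.4 | 396.90 | 15.94 | 19.8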

Last rows

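Index | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV
390 | 9.96654 | 0.0 | 18.10 | 0 | 0.740 | 6.485 | 100.0 | 1.9784 | 24 | 666.0 | 20.2 | 386.73 | 18.85 | 15.4
391 | 14.05070 | 0.0 | 18.10 | 0 | 0.597 | 6.657 | 100.0 | 1.5275 | 24 | 666.0 | 20.2 | 35.05 | 21.22 | 17.2
392 | 0.11432 | 0.0 | 8.56 | 0 | 0.520 | 6.781 | 71.3 | 2.8561 | 5 | 384.0 | 20.9 | 395.58 | 7.67 | 26.5
393 | 0.59005 | 0.0 | 21.89 | 0 | 0.624 | 6.372 | 97.9 | 2.3274 | 4 | 437.0 | 21.2 | 385.76 | 11.12 | 23.0
394 | 0.06860 | 0.0 | 2.89 | 0 | 0.445 | 7.416 | 62.5 | 3.4952 | 2 | 276.0 | 18.0 | 396.90 | 6.19 | 33.2
395 | 0.03615 | 80.0 | 4.95 | 0 | 0.411 | 6.630 | 23.4 | 5.1167 | 4 | 245.0 | 19.2 | 396.90 | 4.70 | 27.9
396 | 0.17505 | 0.0 | 5.96 | 0 | 0.499 | 5.966 | 30.2 | 3.8473 | 5 | 279.0 | 19.2 | 393.43 | 10.13 | 24.7
397 | 6.65492 | 0.0 | 18.10 | 0 | 0.713 | 6.317 | 83.0 | 2.7344 | 24 | 666.0 | 20.2 | 396.90 | 13.99 | 19.5
398 | 0.13117 | 0.0 | 8.56 | 0 | 0.520 | 6.127 | 85.2 | 2.1224 | 5 | 384.0 | 20.9 | 387.69 | 14.09 | 20.4
399 | 0.06466 | 70.0 | 2.24 | 0 | 0.400 | 6.345 | 20.1 | 7.8278 | 5 | 358.0 | 14.8 | 368.24 | 4.97 | 22.5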